



Technical Report TR-187 March 1972 


METHODS OF COMPUTING VOCABULARY 
SIZE FOR THE TWO-PARAMETER 
RANK DISTRIBUTION 


H. P. Edmundson
G. Fostel 

I. Tung 

W. Underwood 


(NASA-CR-129806) METHODS OF COMPUTING VOCABULARY SIZE FOR
H. P. Edmundson, et al. (Maryland Univ.) Mar. 1972, 43 p.
N73-13
CSCL 09B   Unclas   G3/08 50321


UNIVERSITY OF MARYLAND 


COMPUTER SCIENCE CENTER

COLLEGE PARK, MARYLAND 






This research was sponsored in part by the Office of Naval Research under
contract N00014-67-A-0239-0004, NR 049-261; in part by the National
Aeronautics and Space Administration under grant NGL-21-002-008; and in
part by the National Science Foundation Science Development Program
under grant GU-2061. Reproduction in whole or in part is permitted for
any purpose of the United States Government.



ABSTRACT 


This paper describes a summation method for computing the vocabulary
size for given parameter values in the 1- and 2-parameter rank
distributions. Two methods of determining the asymptotes for the family of
2-parameter rank-distribution curves are also described. Tables are
computed and graphs are drawn relating pairs of parameter values to the
vocabulary size. The partial product formula for the Riemann zeta function
is investigated as an approximation to the partial sum formula for the
Riemann zeta function. An error bound is established that indicates that
the partial product should not be used to approximate the partial sum in
calculating the vocabulary size for the 2-parameter rank distribution.
1. INTRODUCTION

1.1 Background

This paper is a continuation of the research reported by Edmundson
[1972]. That paper included a historical summary of the controversy
concerning the rank hypothesis. The rank hypothesis is based on the observation
of the American philologist G. K. Zipf [1935, 1949] that the relative frequency
f of a word type of rank r is approximately a constant c times the reciprocal
of its rank r.

The model corresponding to Zipf's observation is that the probability 
of the occurrence of a word type of rank r is the product of a parameter c 
and the reciprocal of the rank of that word type. Hence the rank distribution 
formulated by Zipf has the density function 


$$p_r = c\,r^{-1}, \qquad c > 0,$$

for r = 1, ..., v, where v is the theoretical vocabulary size.

The American linguist M. Joos [1936] observed that empirical data is 
not adequately fitted by Zipf’s rank distribution, especially at the extremes 
where the rank is either very high or very low. Joos introduced a second 
parameter b as the exponent of the rank r. Thus the rank distribution formu- 
lated by Joos has the density function 

$$p_r = c\,r^{-b}, \qquad b \ge 1,\ c > 0,$$

for r = 1, ..., v. Let the cumulative distribution function be denoted by

$$F_r = \sum_{k=1}^{r} p_k .$$

Since $F_v = 1$, it follows that

$$c \sum_{r=1}^{v} r^{-b} - 1 = 0 .$$
Note that the above equation is of the form $\phi(v,b,c) = 0$ and hence implies
that v is a function of b and c.

1.2 Purpose 

The purpose of this paper is to present several methods for computing the 
vocabulary size v, given values of the parameters b and c in the 2-parameter 
rank distribution. The linguistic motivation for this mathematical research is 
to provide linguists with a parameterized family of curves that will permit 
them to do the following: 

(1) given any two of the three quantities v, b, and c, find the third. 

(2) given any one of the three quantities v, b, and c, find the set of
all possible pairs of the remaining two. 

Of these possibilities perhaps the most linguistically interesting are the 
following: 

(a) assuming a given vocabulary size v, find a pair of parameter values 
b and c that are linguistically satisfactory. 

(b) assuming fixed values of the parameters b and c, compute the 
resulting vocabulary size v. 

(c) assuming given values of the vocabulary size v and the parameter c, 
compute the resulting value of the parameter b. 

(d) assuming given values of the vocabulary size v and the parameter b, 
compute the resulting value of the parameter c. 


1.3 Scope

The remainder of this paper presents several methods of computing the 
vocabulary size v, given values of the parameters b and c. Section 2 discusses 
a direct summation method of calculating v for the 2-parameter rank
distribution. Section 3 discusses a method for computing vocabulary
size using a finite product involving primes. Section 4 presents two
methods for determining asymptotes to the rank-distribution curves. This 
section contains, as the major result of the paper, a graph of the parame- 
terized family of curves together with their asymptotes. 

1.4 Results 

Tables have been computed and graphs have been drawn for v satisfying
the equation

$$\phi(v,b,c) = c \sum_{r=1}^{v} r^{-b} - 1 = 0$$

for certain values of the parameter b in the interval 0.90 to 1.14 and the
parameter c in the interval 0.05 to 0.15. Asymptotes to the curves
representing v vs. b have been determined for each value of c. An error bound
has been derived for the partial product formula for the Riemann zeta function
as an approximation to the partial sum formula for the Riemann zeta function.

More extensive results covering approximation formulas for the vocabu- 
lary size for the 1-, 2-, and 3-parameter rank distributions are given in 
Edmundson et al. [1972]. 
2. SUMMATION METHOD

2.1 Program for the Summation Method 

The most straightforward way to solve for v, given b and c in the
2-parameter rank distribution where

$$c \sum_{r=1}^{v} r^{-b} = 1 ,$$

is to add terms one at a time until the sum multiplied by c first
exceeds 1. The number v* of terms summed will be regarded as an
approximation of the exact value v.
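A minimal sketch of this summation method in Python, assuming IEEE double precision throughout (the original program mixes single and double precision, as described below). The function name `vocabulary_size` and the `max_terms` cutoff are illustrative additions, not part of the report; the cutoff stops the loop when no finite v exists (see Section 4).

```python
from typing import Optional


def vocabulary_size(b: float, c: float, max_terms: int = 10_000_000) -> Optional[int]:
    """Smallest v with c * sum_{r=1}^{v} r**(-b) >= 1, or None if max_terms is reached first.

    `max_terms` is a hypothetical safety cutoff: for b > 1 and c * zeta(b) <= 1
    no finite v exists and the loop would otherwise never end.
    """
    s = 0.0                      # running partial sum
    for r in range(1, max_terms + 1):
        t = r ** (-b)            # the r-th term, r^(-b)
        s += t
        if c * s >= 1.0:         # q = c * s has reached 1
            return r
    return None


print(vocabulary_size(1.0, 0.15))
```

For b = 1.0 and c = 0.15 the call returns 441, which agrees with the first row of Fig. 2.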

The values initially proposed for consideration were b = 0.90, 0.95,
0.99(0.01)1.20 and c = 0.05(0.01)0.15, where the notation x(h)y denotes the
values from x to y in steps of h. Later, it was judged advisable to look
at the fine structure in the range c = 0.065(0.001)0.100 when b = 1.00.
However, v was not computed for all proposed values of b and c, since either
(1) the computation time is known to be excessive or (2) no such value of v
exists. (See Section 4 on asymptotes.)

An ALGOL program for the summation method is presented in Fig. 1. In
this program b and c are the parameters of the implicit function $\phi$, r is the
iterated variable, t is the reciprocal of r to the power b, log(v) is the
common logarithm of v, s is the double-precision sum of the terms t, and q is
the product of c and s. A value of b is read and c is initialized to 0.15.
The program iterates through the loop, increasing r and computing q, until
q reaches or exceeds 1.0. The value of r at that point is regarded as the value
of v with respect to the parameters b and c. The common logarithm of v is
computed to facilitate graphing the relationship of b, c, and v. The values
of c, v, $\log_{10} v$, t, and q are then output.

The addition of terms t to form s causes some complication in this
program. The UNIVAC 1108 computer used for these computations allows a
precision of up to 9 significant decimal digits. As r increases to the order
$10^{7}$, t is of the order $10^{-7}$. When s becomes greater than 10, adding
numbers of the order $10^{-7}$ to s would be meaningless on this computer.
Therefore, s and q have been chosen to be double-precision variables, allowing
18 significant decimal digits for each. Double precision was not used for other
variables to save computation time in arithmetic operations, especially for
exponentiation.
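The absorption effect described above can be reproduced on current hardware. In this minimal sketch, IEEE 754 single precision (roughly 7 significant decimal digits) stands in for the UNIVAC 1108 single-precision word. That substitution is an assumption made only for illustration. `to_single` and `accumulate` are hypothetical helper names, and the standard `struct` module rounds each intermediate sum to single precision.

```python
import struct


def to_single(x: float) -> float:
    """Round an IEEE double to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def accumulate(s: float, t: float, times: int, single: bool) -> float:
    """Add the term t to s repeatedly, optionally rounding each sum to single precision."""
    for _ in range(times):
        s = s + t
        if single:
            s = to_single(s)
    return s


# A term of order 1e-7 added 1000 times to a sum of order 10:
print(accumulate(10.0, 1e-7, 1000, single=True))   # every addition is lost
print(accumulate(10.0, 1e-7, 1000, single=False))  # grows by about 1e-4
```

In single precision the spacing between representable numbers near 10 is about 1e-6, so each 1e-7 term rounds away entirely. Keeping only the accumulator in double precision, as Fig. 1 does, avoids this without paying for double-precision exponentiation.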


begin comment summation method;
   real b, c, r, t, v;
   comment log returns the common logarithm, log10(e)*ln(x);
   real procedure log(x); real x;
      log := 0.43429448*ln(x);
   comment use double precision for s and q;
   real 2 s, q;
   format val(LR15.8, R25.18, A1.0);
   read(b);
   s := 0.0&&0;
   r := 0.0;
   comment c decreases, so s and r carry over from one c to the next;
   for c := 0.15 step -.01 until 0.05 do
   begin
loop: r := r+1;
      t := r**(-b);
      s := s+t;
      q := c*s;
      if q < 1.0&&0 then go to loop;
      v := r;
      write(val, c, v, log(v), t, q)
   end
end


Figure 1. ALGOL Program for Summation Method. 


Instead of computing the sum for each value of c, considerable computer 
time is saved by the following procedure. For fixed b the c's are arranged 
in decreasing order. When the sum (multiplied by c) first exceeds 1.0, the 
calculation for the next smaller c may be started by using the current partial 
sum instead of restarting from its first term. 
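The reuse of the partial sum across decreasing values of c can be sketched as follows, again assuming double precision throughout. The function name `vocabulary_sizes` is illustrative. The sketch depends on the values of c arriving in decreasing order. Under that ordering, c_next * S_{r-1} < c * S_{r-1} < 1, so the answer for the next c can never be smaller than the current r.

```python
from typing import Dict, List


def vocabulary_sizes(b: float, cs: List[float]) -> Dict[float, int]:
    """Vocabulary size v for each c in `cs`, sharing one running partial sum.

    Assumes `cs` is in decreasing order, so each target 1/c is at least as large
    as the previous one and the sum never has to restart from its first term.
    """
    sizes: Dict[float, int] = {}
    s = 0.0   # partial sum of k**(-b) for k = 1..r, carried over between values of c
    r = 0     # number of terms in s so far
    for c in cs:
        while c * s < 1.0:   # extend the shared sum until q = c * s reaches 1
            r += 1
            s += r ** (-b)
        sizes[c] = r
    return sizes


print(vocabulary_sizes(1.0, [0.15, 0.14, 0.13, 0.12, 0.11, 0.10]))
```

The total work equals the work for the smallest c alone, rather than the sum of the work for every c.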

The computation time for each term in the sum has been found to be 
approximately 80 microseconds. The computation time for v is directly
proportional to the size of v with a proportionality constant of 80 micro- 
seconds. For example, the value v = 898,515 calculated for b = 1.00 and 
c = 0.07 took approximately 70 seconds to compute on the UNIVAC 1108. 
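For b = 1 the sum is the harmonic number H_v, and the standard asymptotic H_v ≈ ln v + γ (γ ≈ 0.5772 is Euler's constant) predicts v ≈ exp(1/c - γ). That approximation is not used in this report and is brought in here only as a rough planning aid. Combined with the 80-microsecond cost per term quoted above, it gives the following sketch of expected run times. The function name is illustrative.

```python
import math

EULER_GAMMA = 0.5772156649015329
SECONDS_PER_TERM = 80e-6  # per-term cost on the UNIVAC 1108, as reported in the text


def estimated_v(c: float) -> float:
    """Approximate vocabulary size for b = 1, from H_v ~ ln v + gamma."""
    return math.exp(1.0 / c - EULER_GAMMA)


for c in (0.10, 0.07, 0.06, 0.05):
    v = estimated_v(c)
    print(f"c = {c:.2f}: v ~ {v:,.0f}, run time ~ {v * SECONDS_PER_TERM:,.0f} s")
```

For c = 0.07 this reproduces v ≈ 898,500 and about 72 seconds, in line with the figures above. For c = 0.06 it predicts close to ten million terms, well beyond a 75-second run, and c = 0.05 would take hours.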

2.2 Sample Output and Graph 

The sample output in the case b = 1.0 is tabulated in Fig. 2 and its
graph is plotted in Fig. 3. The output values v, t, and q are respectively
those values of r, t, and q immediately after q has reached or exceeded 1.0.
Therefore v is the number of terms in the sum and the variable t is the last
term in the sum, that is,

$$t = v^{-b} .$$

The table does not contain values of c less than 0.07 because the run was 
stopped after 75 seconds of execution. 


c          v    log10(v)        t               q
0.15     441    2.6444385E+00   2.2675737E-03   1.00010907172776739D+000
0.14     710    2.8512583E+00   1.4084507E-03   1.00004584500300125D+000
0.13    1230    3.0899051E+00   8.1300813E-04   1.00001089167823920D+000
0.12    2336    3.3684728E+00   4.2808219E-04   1.00003499380429624D+000
0.11    4983    3.6974908E+00   2.0068232E-04   1.00002136503591327D+000
0.10   12367    4.0922642E+00   8.0860354E-05   1.00000429331210616D+000
0.09   37568    4.5748180E+00   2.6618399E-05   1.00000231334124586D+000
0.08  150661    5.1780007E+00   6.6374178E-06   1.00000052021645891D+000
0.07  898515    5.9535252E+00   1.1129475E-06   1.00000004305938254D+000


Figure 2. Computer Results for Summation Method for b = 1.00. 
2.3 Tables and Graph of the Results

The table of $\log_{10} v$ for certain values of b and c may be found in Fig. 4.
More comprehensive tables are given in Appendix A for c at intervals of
0.01. For b = 1.0 the fine structure is given in Appendix B for c at
intervals of 0.001.

The family of curves relating the values $\log_{10} v$, b, and c is presented
in Fig. 5.
3. THE TWO-PARAMETER RANK DISTRIBUTION AND THE RIEMANN ZETA FUNCTION

3.1 The Partial Sum and Partial Product Formulas for the Riemann Zeta Function

Since values of v greater than $10^{6}$ could not be computed within reasonable
computation times (as indicated in Fig. 4), another method for computing the
vocabulary size must be found. Note that the function

$$f(v,b) = \sum_{r=1}^{v} r^{-b}$$

derived from the 2-parameter rank distribution is actually a partial sum of
the Riemann zeta function defined by

$$\zeta(b) = \sum_{r=1}^{\infty} r^{-b}, \qquad b > 1 .$$


One of the most important theorems concerning the Riemann zeta function is
Euler's product formula

$$\zeta(b) = \prod_{k=1}^{\infty} \left(1 - p_k^{-b}\right)^{-1}, \qquad b > 1,$$

where $p_k$ is the k-th prime number (see Apostol [1957, p. 389]; Jahnke, Emde,
and Lösch [1960, p. 37]). Let


$$S_n = \sum_{r=1}^{n} r^{-b}$$

denote the n-th partial sum of the Riemann zeta function and let

$$P_n = \prod_{k=1}^{n} \left(1 - p_k^{-b}\right)^{-1}$$

denote the n-th partial product of the Riemann zeta function. Because of the
sparseness of prime numbers, consideration has been given to approximating the
partial sum by the partial product: $P_n$ involves only n factors, whereas the
corresponding partial sum $S_{p_n}$ involves $p_n$ terms.

For this approximation it is desirable to derive a bound on the difference
between the partial product and the partial sum. Since

$$(1 - x)^{-1} = 1 + x + x^{2} + x^{3} + \cdots$$

for $|x| < 1$, the partial product $P_n$ may be written as


$$P_n = \prod_{k=1}^{n} \left(1 - p_k^{-b}\right)^{-1} = \prod_{k=1}^{n} \left(1 + p_k^{-b} + p_k^{-2b} + p_k^{-3b} + \cdots\right)$$

$$= \left(1 + p_1^{-b} + p_1^{-2b} + \cdots\right) \cdots \left(1 + p_n^{-b} + p_n^{-2b} + \cdots\right)$$


After multiplication all terms are of the form

$$p_1^{-e_1 b}\, p_2^{-e_2 b} \cdots p_n^{-e_n b}$$

where the $e_i$ are non-negative integers for i = 1, ..., n. Therefore the
partial product may be expressed as the sum of all such terms:

$$\prod_{k=1}^{n} \left(1 - p_k^{-b}\right)^{-1} = \sum_{e_1=0}^{\infty} \sum_{e_2=0}^{\infty} \cdots \sum_{e_n=0}^{\infty} p_1^{-e_1 b}\, p_2^{-e_2 b} \cdots p_n^{-e_n b}$$


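Since every positive integer $r \le p_n$ has all of its prime factors among
$p_1, \ldots, p_n$, it can be expressed as

$$r = p_1^{e_1}\, p_2^{e_2} \cdots p_n^{e_n}$$

for some integers $e_i \ge 0$, i = 1, ..., n, so each term $r^{-b}$ with
$r \le p_n$ appears (exactly once, by unique factorization) in the sum above.
Because that sum also contains terms for integers larger than $p_n$ (for
example $p_1^{e_1}$ with $e_1$ large), it follows that

$$S_{p_n} = \sum_{r=1}^{p_n} r^{-b} < \sum_{e_1=0}^{\infty} \sum_{e_2=0}^{\infty} \cdots \sum_{e_n=0}^{\infty} p_1^{-e_1 b}\, p_2^{-e_2 b} \cdots p_n^{-e_n b} = P_n .$$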


Since by definition

$$\zeta(b) - P_n = \sum_{r=1}^{\infty} r^{-b} - \prod_{k=1}^{n} \left(1 - p_k^{-b}\right)^{-1},$$

it follows that

$$\zeta(b) - P_n < \sum_{r=1}^{\infty} r^{-b} - \sum_{r=1}^{p_n} r^{-b} = \sum_{r=p_n+1}^{\infty} r^{-b} .$$
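Thus, since each omitted factor $(1 - p_k^{-b})^{-1}$ with k > n exceeds 1, so
that $P_n < \zeta(b)$,

$$0 < \zeta(b) - P_n < \sum_{r=p_n+1}^{\infty} r^{-b} .$$

Multiplying by -1 and adding the term $\sum_{r=p_n+1}^{\infty} r^{-b}$ throughout,
and using $\zeta(b) - \sum_{r=p_n+1}^{\infty} r^{-b} = S_{p_n}$, it follows that

$$0 < P_n - S_{p_n} < \sum_{r=p_n+1}^{\infty} r^{-b} .$$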


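Since each term $r^{-b}$ is less than $\int_{r-1}^{r} x^{-b}\,dx$,

$$\sum_{r=p_n+1}^{\infty} r^{-b} < \int_{p_n}^{\infty} x^{-b}\,dx = \frac{p_n^{1-b}}{b-1}, \qquad b > 1,$$

the bounds for the difference between the partial product and the partial sum
of the Riemann zeta function may be given by

$$0 < P_n - S_{p_n} < \frac{p_n^{1-b}}{b-1} . \qquad (3.1)$$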
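The chain of inequalities leading to (3.1) can be spot-checked numerically. This sketch assumes a small trial-division prime list, which is adequate for small n, and double-precision arithmetic. It compares $P_n$ with $S_{p_n}$ and with the bound (3.1). The function names are illustrative.

```python
import math
from typing import List, Tuple


def first_primes(n: int) -> List[int]:
    """The first n primes by trial division (fine for small n)."""
    primes: List[int] = []
    k = 2
    while len(primes) < n:
        if all(k % p for p in primes if p * p <= k):
            primes.append(k)
        k += 1
    return primes


def check_bound(b: float, n: int) -> Tuple[float, float]:
    """Return (P_n - S_{p_n}, bound from (3.1)) for b > 1 and the first n primes."""
    primes = first_primes(n)
    p_n = primes[-1]
    partial_product = math.prod(1.0 / (1.0 - p ** (-b)) for p in primes)
    partial_sum = math.fsum(r ** (-b) for r in range(1, p_n + 1))
    bound = p_n ** (1.0 - b) / (b - 1.0)
    return partial_product - partial_sum, bound


for b in (1.2, 2.0, 3.0):
    diff, bound = check_bound(b, 50)
    print(f"b = {b}: 0 < {diff:.3e} < {bound:.3e}")
```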


For example, if b = 2.0 and $p_n = 10^{6}$, then (3.1) gives an error bound
of the order $10^{-6}$. Since $S_{p_n} > 1$, the relative error bound is

$$\frac{P_n - S_{p_n}}{S_{p_n}} < \frac{p_n^{1-b}}{b-1} = 10^{-6} .$$

Hence for values of b and $p_n$ of these magnitudes or larger, the partial
product $P_n$ is a good approximation to the partial sum $S_{p_n}$.



On the other hand, if b = 1.1 and $p_n = 10^{6}$, then (3.1) gives an
error bound of approximately 2.5. Since $S_{p_n} < \zeta(b) = 10.584$, the
relative error bound is at least about 1/4. Hence, for values of b close to 1,
the upper bound is too loose to be a useful estimate of the difference. To
estimate this difference better, the values of $P_n$ and $S_{p_n}$ will be
calculated directly.
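The two numerical examples above can be reproduced directly from (3.1). In this short sketch, `ZETA_1_1` is the value of ζ(1.1) quoted in Section 4, and `error_bound` is an illustrative name.

```python
def error_bound(b: float, p_n: float) -> float:
    """Upper bound (3.1) on P_n - S_{p_n}: p_n**(1 - b) / (b - 1), valid for b > 1."""
    return p_n ** (1.0 - b) / (b - 1.0)


ZETA_1_1 = 10.584448464  # zeta(1.1), as quoted in Section 4

print(error_bound(2.0, 1e6))             # of the order 1e-6
print(error_bound(1.1, 1e6))             # about 2.5
print(error_bound(1.1, 1e6) / ZETA_1_1)  # lower estimate of the relative bound
```

Dividing by ζ(1.1) rather than by the smaller $S_{p_n}$ gives the most favorable relative bound possible (about 0.24), and even that is far too large to be useful.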


3.2 Comparison of the Partial Sum and Partial Product 

This section is devoted to the calculation of the partial sum $S_{p_n}$ and the
partial product $P_n$ for b in the interval (1.0, 1.2]. One problem with the
latter calculation is the need to generate primes. The prime number generator
presented by Chartres [1967] is used here to generate prime numbers less than
60,000. It has been rewritten in FORTRAN and appears in Appendix D. With
these prime numbers the partial product $P_n$ may be calculated by multiplying
factor by factor. Graphs comparing the partial sums $S_{p_n}$ and partial
products $P_n$ for b = 1.0, 1.1, and 1.2 are shown in Figs. 6, 7, and 8,
respectively. Tables for these data points are given in Appendix C. For
b = 1.0 in Fig. 6, the graphs of $P_n$ and $S_{p_n}$ appear to diverge and then
converge. For b = 1.2 in Fig. 8, the partial product is a relatively good
approximation to the partial sum. However, the main concern in this research is
for b in the interval (1.0, 1.2], even though the vocabulary size is undefined
for b close to 1.2 in the chosen range of the parameter c, as is explained in
Section 4 below.

It should be recalled that this research is concerned with finding the
number of terms summed (that is, the vocabulary size), rather than the sum
itself. Even though the difference between the partial sum $S_{p_n}$ and the
partial product $P_n$ may be small, there may still be a great difference
between the number of terms summed in the partial sum and the largest prime
$p_n$ in the partial product. For example, in the case b = 1.1, if
$p_n$ = 59,887, then $P_n$ = 8.78 and $S_{p_n}$ = 7.25, giving a difference of
only 1.53. However, $P_n$ exceeds the value 7.25 when $p_n$ = 1,009, while
$S_{p_n}$ exceeds this value when $p_n$ = 59,887. Therefore, the partial product
should not be used to approximate the partial sum in calculating the vocabulary
size for the 2-parameter rank distribution.
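The point about term counts can be checked with a short sketch. It assumes a plain sieve of Eratosthenes in place of the Chartres generator and uses IEEE double arithmetic. The helper names are hypothetical. For b = 1.1 it finds the first prime at which $P_n$ reaches 7.25 and the number of terms the partial sum needs to reach the same value. The report gives $p_n$ = 1,009 for the first quantity; the second should come out in the tens of thousands. Exact crossing points may shift slightly with rounding.

```python
import math
from typing import List, Optional


def primes_up_to(limit: int) -> List[int]:
    """Sieve of Eratosthenes (a stand-in for the Chartres generator of Appendix D)."""
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(limit) + 1):
        if is_prime[p]:
            is_prime[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return [p for p in range(limit + 1) if is_prime[p]]


def first_prime_where_product_reaches(b: float, target: float, limit: int) -> Optional[int]:
    """Smallest prime p_n <= limit with P_n >= target, multiplying factor by factor."""
    product = 1.0
    for p in primes_up_to(limit):
        product /= 1.0 - p ** (-b)
        if product >= target:
            return p
    return None


def terms_where_sum_reaches(b: float, target: float, limit: int) -> Optional[int]:
    """Smallest m <= limit whose partial sum of r**(-b), r = 1..m, is >= target."""
    total = 0.0
    for m in range(1, limit + 1):
        total += m ** (-b)
        if total >= target:
            return m
    return None


print(first_prime_where_product_reaches(1.1, 7.25, 60_000))
print(terms_where_sum_reaches(1.1, 7.25, 60_000))
```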
4. ASYMPTOTES OF THE RANK-DISTRIBUTION CURVES
4.1 Graphical Significance 

In Section 2 the family of curves of v vs. b with c as a parameter was
studied by investigating the implicit function

$$\phi(v,b,c) = c \sum_{r=1}^{v} r^{-b} - 1 = 0 .$$

There, the intervals of interest were [1.0, 1.2] for b and [0.05, 0.15] for
c. Since the series

$$\sum_{r=1}^{\infty} r^{-b}$$

converges for b > 1, values of v that satisfy $\phi(v,b,c) = 0$ do not exist
for those values of c such that

$$1/c \ge \sum_{r=1}^{\infty} r^{-b} = \zeta(b) .$$

For fixed c, v increases without bound as b increases toward such a critical
value. Therefore it is of interest to find the values of b that yield the
asymptotes for these curves.

Since, for b > 1, v increases as b increases, the asymptotes will be
the vertical lines b = b* where b* satisfies

$$1 = c \sum_{r=1}^{\infty} r^{-b^*} = c\,\zeta(b^*) .$$

That is, for each value of c the value b* must be found such that

$$\zeta(b^*) = 1/c . \qquad (4.1)$$

Unfortunately, tables for the Riemann zeta function cannot be found that
permit the calculation of b* for c = 0.05(0.01)0.15. For example, in the
available tables $\zeta(b)$ jumps from 10.584 at b = 1.1 to $\infty$ at b = 1.0.
Thus it is impossible to interpolate intermediate values of $\zeta(b^*)$.





Two methods are suggested here for determining the asymptotes. It
turns out that they give similar values. Both of these methods are based on
the graph of the curve of $\zeta(b) - 1/(b-1)$, which is tabulated in Fig. 9 and
plotted in Fig. 10; see also Walther [1926, p. 396] for a previous plot of
this difference. The values of $\zeta(b)$ are given in Dwight [1961].
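As an aside, ζ(b) for b > 1 can also be evaluated directly rather than read from tables. The sketch below swaps in Euler-Maclaurin summation, which is not a method used in this report. Nine terms of the series are summed explicitly, and the tail is replaced by an integral plus three Bernoulli-number corrections. The function names are illustrative. Under those assumptions the sketch reproduces the entries of Fig. 9, and it solves (4.1) for b* by bisection. That root provides a reference value for the two approximate methods that follow.

```python
import math


def zeta(s: float, n: int = 10) -> float:
    """Riemann zeta for real s > 1: direct sum of n - 1 terms plus an Euler-Maclaurin tail."""
    if s <= 1.0:
        raise ValueError("the series diverges for s <= 1")
    head = math.fsum(r ** (-s) for r in range(1, n))
    tail = (n ** (1.0 - s) / (s - 1.0)      # integral of x**(-s) from n to infinity
            + n ** (-s) / 2.0
            + s * n ** (-s - 1.0) / 12.0    # Bernoulli-number corrections
            - s * (s + 1) * (s + 2) * n ** (-s - 3.0) / 720.0
            + s * (s + 1) * (s + 2) * (s + 3) * (s + 4) * n ** (-s - 5.0) / 30240.0)
    return head + tail


def g(b: float) -> float:
    """The tabulated difference zeta(b) - 1/(b - 1)."""
    return zeta(b) - 1.0 / (b - 1.0)


def b_star(c: float, lo: float = 1.0 + 1e-6, hi: float = 3.0) -> float:
    """Solve zeta(b*) = 1/c by bisection; zeta is decreasing for b > 1."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if zeta(mid) > 1.0 / c:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for b in (1.1, 1.2, 2.0, 3.0):
    print(f"{b:.1f}  {zeta(b):.7f}  {g(b):.7f}")
```

The bisection root for c = 0.15 should lie within about 1e-5 of the straight-line value in Fig. 12 and about 1e-4 from the constant-value one. This is consistent with g(b) being very nearly linear near b = 1.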


b     1/(b-1)       ζ(b)          ζ(b) - 1/(b-1)
1.1   10.00000 00   10.58444 85   0.58444 85
1.2    5.00000 00    5.59158 24   0.59158 24
1.3    3.33333 33    3.93194 92   0.59861 59
1.4    2.50000 00    3.10554 73   0.60554 73
1.5    2.00000 00    2.61237 53   0.61237 53
1.6    1.66666 67    2.28576 57   0.61909 90
1.7    1.42857 14    2.05428 88   0.62571 74
1.8    1.25000 00    1.88222 96   0.63222 96
1.9    1.11111 11    1.74974 64   0.63863 53
2.0    1.00000 00    1.64493 41   0.64493 41
2.5    0.66666 67    1.34148 73   0.67482 06
3.0    0.50000 00    1.20205 69   0.70205 69
3.5    0.40000 00    1.12673 39   0.72673 39
4.0    0.33333 33    1.08232 32   0.74898 99
4.5    0.28571 43    1.05470 75   0.76899 32
5.0    0.25000 00    1.03692 78   0.78692 78
5.5    0.22222 22    1.02520 46   0.80298 24
6.0    0.20000 00    1.01734 31   0.81734 31
6.5    0.18181 82    1.01200 59   0.83018 77
7.0    0.16666 67    1.00834 93   0.84168 26

Figure 9. Table of Values of ζ(b) - 1/(b-1).








[Graph: ζ(b) - 1/(b-1) plotted against b, for b from 0 to 7]

Figure 10. Graph of ζ(b) - 1/(b-1).
4.2 Constant-value Method

The constant-value method assumes that the value $\zeta(b) - 1/(b-1)$ is nearly
constant when b is close to 1. This is confirmed by observing Fig. 10 for b
in the interval (1.0, 1.2]; for example,

$$\zeta(1.1) - \frac{1}{1.1-1} = 0.584$$

and

$$\zeta(1.2) - \frac{1}{1.2-1} = 0.592 .$$

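Let

$$a = \zeta(b) - \frac{1}{b-1} . \qquad (4.2)$$

Thus b* must satisfy both (4.1) and (4.2); substituting $\zeta(b^*) = 1/c$ into
(4.2) gives $1/(b^*-1) = 1/c - a$, and hence b* must satisfy

$$b^* = \frac{1}{1/c - a} + 1 . \qquad (4.3)$$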

Because (1.0, 1.2] is the interval of b under consideration, the midpoint
b = 1.1 is chosen. For this point, a = 0.584 448 464, since
$\zeta(1.1)$ = 10.584 448 464.

Fig. 11 is a table of the asymptotes b = b* given by (4.3).


c      b*
0.05   1.051 505
0.06   1.062 180
0.07   1.072 986
0.08   1.083 924
0.09   1.094 997
0.10   1.106 207
0.11   1.117 558
0.12   1.129 051
0.13   1.140 689
0.14   1.152 476
0.15   1.164 414


Figure 11. Asymptotes Obtained by Constant-value Method.
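Equation (4.3) is a one-line computation. In this minimal sketch, `a` is fixed at the value obtained from ζ(1.1) above, and `b_star_constant` is an illustrative name.

```python
A_CONST = 0.584448464  # a = zeta(1.1) - 1/(1.1 - 1), from the text


def b_star_constant(c: float, a: float = A_CONST) -> float:
    """Asymptote from (4.3): b* = 1 / (1/c - a) + 1."""
    return 1.0 / (1.0 / c - a) + 1.0


for c in (0.05, 0.10, 0.15):
    print(f"c = {c:.2f}  b* = {b_star_constant(c):.6f}")
```

The three printed values agree with the corresponding rows of Fig. 11 to the six decimals shown.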
4.3 Straight-line Method

As a generalization of the constant-value method, the straight-line
method assumes that the graph of $\zeta(b) - 1/(b-1)$ is close to a straight
line when b is close to 1.

Let

$$g(b) = \zeta(b) - \frac{1}{b-1} .$$

Under the assumption that g(b) is a straight line, $g''(b) = 0$. Hence it
follows from Taylor's formula that

$$g(b) = g(a) + g'(\theta)(b-a) \qquad (4.4)$$

where a is some given point and $\theta$ is some point between b and a.

Again, a = 1.1 is chosen as the given point. Since it is assumed that
$g'(b)$ is constant, the value of $g'(\theta)$ may be calculated from the values
of g at b = 1.1 and b = 1.2 as follows:

$$g'(\theta) = \frac{g(1.2) - g(1.1)}{1.2 - 1.1} = 0.071\,339\,763 .$$

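Thus b* must satisfy both (4.1) and (4.4), and hence b* must satisfy

$$\frac{1}{c} - \frac{1}{b^*-1} = \zeta(1.1) - \frac{1}{1.1-1} + g'(\theta)(b^* - 1.1)$$

or, equivalently (multiplying through by b* - 1 and collecting powers of b*),
b* must satisfy

$$A\,b^{*2} + B\,b^* + C = 0 \qquad (4.5)$$

where

$$A = g'(\theta) = 0.071\,339\,763$$
$$B = \zeta(1.1) - 2.1\,g'(\theta) - \frac{1}{c} - 10$$
$$C = \frac{1}{c} - \zeta(1.1) + 1.1\,g'(\theta) + 11$$

Fig. 12 is a table of the asymptotes b = b* given by the smaller root of
(4.5); the larger root lies far outside the range of b under consideration.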
c      b*
0.05   1.051 496
0.06   1.062 171
0.07   1.072 976
0.08   1.083 916
0.09   1.094 994
0.10   1.106 213
0.11   1.117 575
0.12   1.129 085
0.13   1.140 747
0.14   1.152 563
0.15   1.164 538


Figure 12. Asymptotes Obtained by Straight-line Method.


Note that the above values of b* agree with those in Fig. 11 to 3 decimal
places. Since values of b are considered only in increments of 0.01, the
constant-value method is good enough for determining the asymptotes b = b*
for various values of c.
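The quadratic (4.5) can be solved alongside (4.3) to see how small the difference between the two methods is. This is a minimal sketch using the constants quoted in Sections 4.2 and 4.3. The function names are illustrative. The smaller root is computed in the algebraically equivalent form 2C / (-B + sqrt(B^2 - 4AC)), which avoids cancellation because -B is positive here.

```python
import math

ZETA_1_1 = 10.584448464   # zeta(1.1), from Section 4.2
SLOPE = 0.071339763       # g'(theta), the secant slope of g between b = 1.1 and 1.2


def b_star_line(c: float) -> float:
    """Smaller root of A b*^2 + B b* + C = 0 from (4.5); the larger root is extraneous."""
    A = SLOPE
    B = ZETA_1_1 - 2.1 * SLOPE - 1.0 / c - 10.0
    C = 1.0 / c - ZETA_1_1 + 1.1 * SLOPE + 11.0
    disc = B * B - 4.0 * A * C
    # Stable form of (-B - sqrt(disc)) / (2A), avoiding cancellation since -B > 0.
    return 2.0 * C / (-B + math.sqrt(disc))


def b_star_constant(c: float) -> float:
    """Constant-value estimate (4.3), for comparison."""
    return 1.0 / (1.0 / c - (ZETA_1_1 - 10.0)) + 1.0


for c in (0.05, 0.10, 0.15):
    line, const = b_star_line(c), b_star_constant(c)
    print(f"c = {c:.2f}  line {line:.6f}  constant {const:.6f}  diff {line - const:+.1e}")
```

The differences stay in the fourth decimal place or beyond, far below the 0.01 spacing of b used in the tables.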

The asymptotes of the parameterized family of curves $\phi(v,b,c) = 0$ are
plotted in Fig. 13.







Figure 13. Parameterized Family of Vocabulary Curves and Their Asymptotes. 
5. SUMMARY

The major result of this paper has been the computation of the vocabulary
size v, given the values of the linguistic parameters b and c, which
appear in the 2-parameter rank distribution

$$p_r = c\,r^{-b}, \qquad b \ge 1,\ c > 0,$$

for r = 1, ..., v. This result provides linguists with a parameterized family
of curves, shown in Fig. 5, which will permit them to do the following:

(1) given any two of the three quantities v, b, and c, find the third.

(2) given any one of the three quantities v, b, and c, find the set of
all possible pairs of the remaining two.

Assume for the sake of example that the 130,000 entries contained in Webster's 
Seventh New Collegiate Dictionary [1967] represent the vocabulary size v 
of English. Then from Fig. 5 it may be seen that any one of the following 
pairs of values of the parameters b and c will yield this value v = 130,000:
(1.02, 0.09), (1.04, 0.10), and (1.06, 0.11).

A second result of this paper has been the determination of values of 
the parameters b and c for which v is undefined. These values are represented 
in Fig. 13 as asymptotes to the family of vocabulary-size curves. The two
methods used to determine these asymptotes yield very close results. Hence 
the simpler constant-value method suffices. 

Finally, an error bound has been determined for the partial product of 
the Riemann zeta function as an approximation to the partial sum of the 
Riemann zeta function. For values of the parameter b considered in this 
research, the error bound indicates that the partial product is a poor 
approximation of the partial sum. However, for other values of the parameter 
b, the approximation is good. 
Comprehensive tables of the vocabulary size v for the 2-parameter
rank distribution are given in Appendices A and B.
Appendix A. Table of Vocabulary Size: Gross Structure for Various Values of b

Appendix B. Table of Vocabulary Size: Fine Structure when b = 1.0

Appendix C. Table of Partial Sums and Partial Products of the Riemann Zeta Function
Appendix D. FORTRAN Program for the Prime Number Generator



C     PRIME NUMBER GENERATOR
      INTEGER PRIMES(10000), Q(100), DQ(100)
      LOGICAL LT
C     THIS IS THE UPPER LIMIT OF THE PRIMES TO BE GENERATED
      L=60000
C     J = INDEX OF THE LARGEST PRIME WHOSE MULTIPLES ARE BEING TRACKED
C     K = NUMBER OF PRIMES FOUND SO FAR
      J=2
      K=2
      PRIMES(1)=2
      PRIMES(2)=3
C     Q(I) = NEXT ODD MULTIPLE OF PRIMES(I), DQ(I) = 2*PRIMES(I)
      Q(2)=9
      DQ(2)=6
      DO 1 N=5,L,2
      LT=.TRUE.
      DO 2 I=2,J
      IF (N.NE.Q(I)) GO TO 2
      Q(I)=N+DQ(I)
      LT=.FALSE.
      IF (I.NE.J) GO TO 2
C     N = PRIMES(J)**2, SO START TRACKING THE NEXT PRIME
      J=J+1
      Q(J)=PRIMES(J)**2
      DQ(J)=2*PRIMES(J)
      GO TO 1
    2 CONTINUE
      IF (.NOT.LT) GO TO 1
      K=K+1
      PRIMES(K)=N
      KS=K-9
C     PUNCH THE PRIMES IN GROUPS OF TEN (A FINAL PARTIAL GROUP IS NOT PUNCHED)
      IF ((K/10)*10.EQ.K) PUNCH 100, (PRIMES(I), I=KS,K)
  100 FORMAT (10I8)
    1 CONTINUE
      END
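For readers without a FORTRAN compiler, here is a sketch of the same incremental scheme in Python, based on our reading of the listing above. Each active odd prime p keeps the next odd multiple of p still to be reached (Q) and the step 2p between odd multiples (DQ). A new prime becomes active once the square of the newest active prime is reached. Names are illustrative, and unlike the listing the function returns every prime rather than punching them in groups of ten.

```python
from typing import List


def chartres_primes(limit: int) -> List[int]:
    """Primes <= limit via the Q/DQ incremental scheme of the Appendix D listing."""
    primes = [2, 3]
    nxt = [9]    # nxt[k]: next odd multiple of primes[k + 1] not yet reached (Q)
    step = [6]   # step[k]: 2 * primes[k + 1], the distance between odd multiples (DQ)
    for n in range(5, limit + 1, 2):
        composite = False
        for k in range(len(nxt)):
            if n == nxt[k]:
                nxt[k] += step[k]
                composite = True
        if not composite:
            primes.append(n)
        elif n == primes[len(nxt)] ** 2:
            # n is the square of the newest active prime: activate the next prime
            p = primes[len(nxt) + 1]
            nxt.append(p * p)
            step.append(2 * p)
    return [p for p in primes if p <= limit]


print(chartres_primes(50))
```

Only primes up to the square root of the limit ever need to be tracked, which is why the listing can dimension Q and DQ at 100 while generating primes up to 60,000.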